### **Combinational vs. Sequential Logic**



### **CMOS Circuit Styles**

- Static complementary CMOS: except during switching, the output is connected to either V<sub>DD</sub> or GND via a low-resistance path
  - high noise margins
    - full rail-to-rail swing
    - $V_{OH}$ and $V_{OL}$ are at $V_{DD}$ and GND, respectively
  - low output impedance, high input impedance
  - no steady-state path between V<sub>DD</sub> and GND (no static power consumption)
  - delay is a function of load capacitance and transistor resistance
  - comparable rise and fall times (under the appropriate transistor sizing conditions)
- Dynamic CMOS: relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes
  - simpler, faster gates
  - increased sensitivity to noise

## Static Complementary CMOS

Pull-up network (PUN) and pull-down network (PDN)



#### PUN and PDN are dual logic networks

© Digital Integrated Circuits<sup>2nd</sup>

**Combinational Circuits** 


### **Threshold Drops**

In a CMOS circuit, the PUN is always constructed with PMOS transistors and the PDN with NMOS transistors. Why? Can we do it the opposite way?

Ex: If we exchange the NMOS and PMOS transistors in a CMOS inverter, can we use it as a buffer?

 $\rightarrow$  Not a good design, due to the threshold drops of the NMOS and PMOS transistors.



CMOS inverter: exchange NMOS and PMOS Out=In, a buffer?


#### **Threshold Drops**

□ Fact: NMOS transistors can pass a strong "0" but only a weak "1". PMOS transistors can pass a strong "1" but only a weak "0".

□ In order to get strong "0" (Gnd) and strong "1" ( $V_{DD}$ ) at output, output should connect to Gnd via NMOS transistors (for strong "0"), and should connect to  $V_{DD}$  via PMOS transistors (for strong "1").



### **Construction of PDN**

 NMOS devices connected in series implement "AND" function. However, in a CMOS circuit, when NMOS portion is turned "ON", it connects output to Gnd (instead of Vdd). Thus the actual function implemented by NMOS devices in series in a CMOS circuit is a "NAND" function.



Similarly, NMOS devices in parallel implement "OR" function. Since NMOS connects output to Gnd, the actual function implemented by NMOS devices in parallel in a CMOS circuit is a "NOR" function.




### **Construction of PUN**

- PMOS transistors are used in PUN of CMOS circuits.
   PMOS transistor is turned "ON" when Vin="0".
- PMOS devices connected in series implement "NOR" function.



□ PMOS devices in parallel implement "NAND" function.




### **Dual PUN and PDN**

- PUN and PDN are dual networks
  - DeMorgan's theorems

 $\overline{A + B} = \overline{A} \cdot \overline{B}$  [!(A + B) = !A • !B or !(A | B) = !A & !B]

 $\overline{A \cdot B} = \overline{A} + \overline{B}$  [!(A • B) = !A + !B or !(A & B) = !A | !B]

- A parallel connection of transistors in the PUN corresponds to a series connection of the corresponding transistors in the PDN, and vice versa.
- A complementary gate is naturally inverting (NAND, NOR, AOI, OAI). To implement a non-inverting function (e.g. AND, OR), cascade the CMOS gate with an inverter.
- The number of transistors in an N-input logic gate is 2N.


### **Complementary CMOS Logic Style**

• PUN is the <u>DUAL</u> of PDN

(can be shown using DeMorgan's theorems)

 $\overline{A+B} = \overline{A}\,\overline{B}$

 $\overline{AB} = \overline{A} + \overline{B}$

• The complementary gate is inverting



AND = NAND + INV


### Example Gate: NAND



### **CMOS NOR Gate**



#### Truth Table for NOR Gate

| A | B | F |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

### **Constructing a Complex Gate**

- Constructing a complex CMOS gate:
  - First construct the PDN from the non-inverted function (the gate output is its complement): "•" means NMOS in series, "+" means NMOS in parallel.
  - Then construct the PUN according to DeMorgan's theorem:
    - If two transistors are in series in the NMOS network, they are in parallel in the PMOS network, and vice versa.
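The construction procedure can be sketched in a few lines of Python. This is only an illustration under simple assumptions: networks are modelled as ideal switches, and the nested-tuple representation, the function names, and the example function D + A·(B + C) are all hypothetical choices, not taken from the slides.

```python
from itertools import product

# A network is ("in", name), ("series", [subnets]) or ("parallel", [subnets]).

def dual(net):
    """Build the dual network by swapping series and parallel connections."""
    kind = net[0]
    if kind == "in":
        return net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, [dual(sub) for sub in net[1]])

def conducts(net, inputs, device):
    """Ideal-switch model: an NMOS conducts on input 1, a PMOS on input 0."""
    kind = net[0]
    if kind == "in":
        value = inputs[net[1]]
        return value == 1 if device == "nmos" else value == 0
    results = [conducts(sub, inputs, device) for sub in net[1]]
    return all(results) if kind == "series" else any(results)

def gate_output(pdn, inputs):
    """Output of the complementary gate whose PUN is the dual of the PDN."""
    pull_down = conducts(pdn, inputs, "nmos")
    pull_up = conducts(dual(pdn), inputs, "pmos")
    assert pull_up != pull_down, "exactly one network should conduct"
    return 1 if pull_up else 0

# PDN built from the non-inverted function D + A*(B + C) (hypothetical example).
pdn = ("parallel", [("in", "D"),
                    ("series", [("in", "A"),
                                ("parallel", [("in", "B"), ("in", "C")])])])

for a, b, c, d in product((0, 1), repeat=4):
    out = gate_output(pdn, {"A": a, "B": b, "C": c, "D": d})
    assert out == 1 - (d | (a & (b | c)))  # the gate realizes the complement
```

The internal assertion in `gate_output` checks the key property of dual networks: for every input pattern exactly one of the PUN and PDN conducts, so the output is always driven and never shorted.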



### **XNOR/XOR Implementation**



### **CMOS Properties**

- Full rail-to-rail swing ($V_{OH}=V_{DD}$, $V_{OL}=$ Gnd); high noise margins
- Logic levels not dependent upon the relative device sizes; ratioless
- Always a path to Vdd or Gnd in steady state; low output impedance
- Extremely high input resistance; nearly zero steady-state input current
- No direct steady-state path between power and ground; no static power dissipation
- Propagation delay is a function of load capacitance and resistance of transistors
- Comparable output rise and fall times (under appropriate sizing conditions)


### CMOS Static Property: VTC is Data-Dependent



- VTC depends on data input patterns.
- NAND gate: three input patterns can switch the output from high to low: (a) A=B=0→1; (b) A=1, B=0→1; (c) B=1, A=0→1. Switching threshold: V<sub>M</sub>(a)>V<sub>M</sub>(c)>V<sub>M</sub>(b). Why?
- In (a), both PMOS transistors are ON for A=B=0, representing a strong pull-up. Thus a higher V<sub>M</sub> is needed to pull the output down.
- For (b) and (c), V<sub>tn</sub>(M2)>V<sub>tn</sub>(M1) due to the body effect, so a higher V<sub>A</sub> is needed to switch M2 on.

### VTC is Data-Dependent



#### The threshold voltage of M<sub>2</sub> is higher than M<sub>1</sub> due to the body effect (γ)

 $V_{Tn1} = V_{Tn0}$ 

 $V_{Tn2} = V_{Tn0} + \gamma \left(\sqrt{|2\phi_F| + V_{int}} - \sqrt{|2\phi_F|}\right)$ 

since  $V_{SB}$  of  $M_2$  is not zero (when  $V_B = 0$) due to the presence of  $C_{int}$.
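As a rough numerical illustration of the body-effect expression, the sketch below evaluates the threshold of M<sub>2</sub> for a few internal-node voltages. The device parameters (V<sub>Tn0</sub> = 0.43 V, γ = 0.4 V<sup>1/2</sup>, |2φ<sub>F</sub>| = 0.6 V) are assumed values chosen only for illustration; they do not come from the slides.

```python
import math

def body_effect_vt(vt0, gamma, two_phi_f, v_sb):
    """Threshold voltage with body effect for source-to-bulk voltage v_sb."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

# Assumed, illustrative device parameters (not from the slides).
VT0, GAMMA, TWO_PHI_F = 0.43, 0.4, 0.6

for v_int in (0.0, 0.25, 0.5, 1.0):
    vt2 = body_effect_vt(VT0, GAMMA, TWO_PHI_F, v_int)
    print(f"V_int = {v_int:.2f} V -> V_Tn2 = {vt2:.3f} V")
```

With V<sub>int</sub> = 0 the function returns V<sub>Tn0</sub>, which is the case of M<sub>1</sub>; any positive V<sub>int</sub> raises the threshold of M<sub>2</sub>.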

## CMOS Dynamic Property: Review: CMOS Inverter: Dynamic




#### **Review:** Designing Inverters for Performance

- Reduce C<sub>L</sub>
  - internal diffusion capacitance of the gate itself
  - interconnect capacitance
  - fanout
- Increase the W/L ratio of the transistor
  - the most powerful and effective performance optimization tool in the hands of the designer
  - watch out for self-loading!
- Increase V<sub>DD</sub>
  - only minimal improvement in performance at the cost of increased energy dissipation
- Slope engineering: keep signal rise and fall times smaller than or equal to the gate propagation delays and of approximately equal values
  - good for performance
  - good for power consumption


## Switch Delay Model



### Input Pattern Effects on Delay



### **Delay Dependence on Input Patterns**

- A=B=1→0:  $V_{dd}$  charges  $V_{out}$  via both PMOS transistors in parallel: fast.
- A=1→0, B=1:  $V_{dd}$  charges C<sub>L</sub> via one PMOS transistor: medium.
- A=1, B=1→0:  $V_{dd}$  charges both C<sub>L</sub> and C<sub>int</sub> via one PMOS transistor: slow.

2-input NAND with NMOS = 0.5 μm/0.25 μm and PMOS = 0.75 μm/0.25 μm




## Transistor Sizing for Performance

- Transistor sizing: Adjust the size (generally the width W) of each individual transistor.
- Fact: transistor on-resistance R<sub>eq</sub> (hence the RC delay) is inversely proportional to transistor size (W/L). Thus, to halve R<sub>eq</sub>, we should double the transistor width (W → 2W).
- Transistor sizing: generally the transistor length L is fixed and only the width W is changed.

Reason: to ensure fast speed, the length of all transistors is generally set to the minimum value  $(2\lambda)$  and cannot be reduced further. As a result, to further improve the speed, we should enlarge the transistor width (W).

- Sometimes, to synchronize the signal flow and reduce glitches, we may need to match the propagation delays  $(t_{pLH}, t_{pHL})$  to those of another gate (e.g. an inverter).
- The delay of a CMOS gate is pattern-dependent. For the propagation delay, we generally consider the worst-case scenario.


### Transistor Sizing: NAND Gate

- Determine the sizes of the transistors of a NAND gate such that it has approximately the same  $t_{pLH}$  and  $t_{pHL}$  (worst-case scenario) as a minimum-size inverter ($W_p/L_p=9\lambda/2\lambda$,  $W_n/L_n=3\lambda/2\lambda$).
- Solution: 1). For t<sub>pLH</sub>, worst-case scenario: Vdd charges F via only one PMOS transistor (either one). Thus:



### Transistor Sizing: NOR Gate

- Determine the sizes of the transistors of a NOR gate such that it has approximately the same  $t_{pLH}$  and  $t_{pHL}$  (worst-case scenario) as a minimum-size inverter ($W_p/L_p=9\lambda/2\lambda$,  $W_n/L_n=3\lambda/2\lambda$).
- Solution: 1). For t<sub>pLH</sub>, Vdd has to charge F via 2 PMOS transistors in series. Thus:
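The series/parallel reasoning behind both sizing problems can be sketched as follows. The sketch assumes the first-order model stated earlier (on-resistance inversely proportional to W at fixed L) and ignores the extra diffusion capacitance of wider devices; the function names are hypothetical.

```python
# Reference minimum-size inverter widths in units of lambda (from the problem statement).
INV_WP = 9
INV_WN = 3

def series_width(reference_width, n_series):
    """Width that makes n_series equal devices in series match one reference device.

    Resistances in series add, so each device must be n_series times wider.
    """
    return reference_width * n_series

def size_nand(n_inputs):
    # Worst-case pull-up: a single PMOS (the PMOS devices are in parallel).
    # Pull-down: n_inputs NMOS devices in series.
    return {"pmos": series_width(INV_WP, 1), "nmos": series_width(INV_WN, n_inputs)}

def size_nor(n_inputs):
    # Pull-up: n_inputs PMOS devices in series.
    # Worst-case pull-down: a single NMOS (the NMOS devices are in parallel).
    return {"pmos": series_width(INV_WP, n_inputs), "nmos": series_width(INV_WN, 1)}

print("NAND2 widths (lambda):", size_nand(2))
print("NOR2 widths (lambda):", size_nor(2))
```

Under these assumptions a 2-input NAND keeps the inverter's PMOS width and doubles the NMOS width, while a 2-input NOR doubles the PMOS width and keeps the NMOS width, with all lengths left at 2λ.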



### **Transistor Sizing a Complex CMOS Gate**

Solution: 1). For t<sub>pLH</sub>, worst case: Out is charged either via PMOS transistors A-D or B-C-D.



### **Transistor Sizing a Complex CMOS Gate**

□ Determine the sizes of the transistors of the following gate such that it has approximately the same  $t_{pLH}$  and  $t_{pHL}$  (worst-case scenario) as a minimum-size inverter  $(W_p/L_p=9\lambda/2\lambda, W_n/L_n=3\lambda/2\lambda)$.



### Transistor Sizing a Complex CMOS Gate



### **Fan-In Considerations**

- For gates with large fan-in, the internal node capacitances become significant.
- Consider t<sub>pHL</sub>: Gnd (0 V) needs to propagate to the output via four NMOS transistors.



### t<sub>p</sub> as a Function of Fan-In

#### CMOS NAND gate:

 $\checkmark$ t<sub>pLH</sub> increases linearly with fan-in (N).

Reason: in a NAND gate the PMOS transistors are in parallel, so the capacitance at the output increases linearly with N, and t<sub>pLH</sub> increases linearly with N.

 $\checkmark$ t<sub>pHL</sub> increases quadratically with fan-in (N).

Reason: the NMOS transistors in a NAND gate are connected in series; see the Elmore delay model of a distributed RC network.

Gates with a fan-in greater than 4 should be avoided.
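The quadratic growth of t<sub>pHL</sub> can be illustrated with a minimal Elmore-delay sketch. It assumes N identical series NMOS devices of resistance `r`, an equal capacitance `c_int` on each of the N−1 internal nodes, and a fixed load `c_load` on the output; these names and the normalized values are assumptions for illustration only.

```python
def elmore_tphl(n, r=1.0, c_int=1.0, c_load=1.0):
    """Elmore estimate of t_pHL for n series NMOS devices discharging the output.

    Internal node k (counted from ground) discharges through k devices; the
    output node discharges through all n devices.
    """
    tau = sum(k * r * c_int for k in range(1, n)) + n * r * c_load
    return 0.69 * tau

for n in (1, 2, 4, 8):
    print(f"fan-in {n}: t_pHL ~ {elmore_tphl(n):.2f} (normalized)")
```

In closed form the time constant is R·C<sub>int</sub>·N(N−1)/2 + N·R·C<sub>L</sub>, so the internal-node term grows quadratically with N even when the load is held constant.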



## *t<sub>p</sub>* as a Function of Fan-Out



All gates have the same drive current.

Slope is a function of "driving strength"


## t<sub>p</sub> as a Function of Fan-In and Fan-Out

- Fan-in: quadratic due to increasing resistance and capacitance
- □ Fan-out: each additional fan-out gate adds two gate capacitances to C<sub>L</sub>

 $t_{p} = a_{1}FI + a_{2}FI^{2} + a_{3}FO$ 

#### Fast Complex Gates: Design Technique 1

- Transistor sizing
  - as long as fan-out capacitance dominates
- Progressive sizing: if N transistors are connected in series, then according to the distributed RC line delay model R(M1) appears N times in the delay equation, R(M2) appears (N-1) times, and so on.  $\rightarrow$  R(M1) matters most, so M1 should be sized the largest to reduce the dominant resistance.
- Progressive sizing: gradually increase the transistor size along the series chain when moving away from the output.

(The FET closest to the output should be the smallest.) Progressive sizing can reduce delay by more than 20%, with decreasing gains as technology shrinks.


#### Fast Complex Gates: Design Technique 2

#### Input re-ordering

- When not all inputs arrive at the same time, the input that becomes stable last is called the critical input. The path through the logic that determines the ultimate speed of the circuit is called the critical path.
- Putting critical-path transistors closer to the gate output can speed up the circuit.

Critical input far from the output: the delay is determined by the time to discharge  $C_L$,  $C_1$  and  $C_2$: slow.

Critical input closest to the output: the delay is determined by the time to discharge C<sub>L</sub> only (C<sub>1</sub> and C<sub>2</sub> are already discharged): fast.


## Sizing and Ordering Effects



 $C_L = 100 \, fF$ 

Progressive sizing in the pull-down chain gives up to a 23% *improvement.* 

Input ordering saves 5% (critical path A: 23%; critical path D: 17%).

#### Fast Complex Gates: Design Technique 3

□ Logic restructuring: manipulating logic equations can reduce fan-in requirements and thus reduce the gate delay.

Alternative logic structures

F = ABCDEFGH





Alternative logic structure: less fan-in, faster

#### Fast Complex Gates: Design Technique 4

Isolating fan-in from fan-out using buffer insertion



Real lesson is that optimizing the propagation delay of a gate in isolation is misguided.

#### Fast Complex Gates: Design Technique 5: Sizing Logic Path for Speed

- Frequently, input capacitance of a logic path is constrained
- Logic also has to drive some capacitance
- Example: the ALU load in an Intel microprocessor is 0.5 pF
- How do we size the ALU datapath to achieve maximum speed?
- We have already solved this for the inverter chain – can we generalize it for any type of logic?


### **Transistor Sizing and Gate Sizing**


- Transistor/gate sizing: generally transistor length L is fixed, only width W is changed.
- Transistor sizing: Adjust the size (generally the width W) of each individual transistor.
- Gate sizing: Enlarge or shrink the size (generally the width W) of all the PMOS and NMOS transistors in a gate simultaneously by factor S. That is,
- ✓ Original gate (size "1"): PMOS: (W<sub>pi</sub>/L<sub>pi</sub>), NMOS: (W<sub>ni</sub>/L<sub>ni</sub>),
- ✓ The same gate with size S: PMOS:  $(W_{pi}/L_{pi})$ → $(S \cdot W_{pi}/L_{pi})$ , NMOS:  $(W_{ni}/L_{ni})$ → $(S \cdot W_{ni}/L_{ni})$ .
- Sizing a gate from size "1" to size "S" does not change the size ratio between PMOS and NMOS transistors, hence it does not change the (t<sub>pHL</sub>/t<sub>pLH</sub>) ratio of the gate.



## **Buffer Example**

□ The delay of an inverter chain can be minimized by proper sizing:



For given *N*:  $C_{i+1}/C_i = C_i/C_{i-1}$,  $f=(F)^{1/N}$.

To find *N*:  $f=C_{i+1}/C_i \sim 4$,  $N=\ln(F)/\ln(f)$.

□ How to generalize this strategy to any logic path?
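Before generalizing, the inverter-chain rule can be sketched numerically. The helper below is a hypothetical illustration: it picks the stage count from a target fan-out of about 4, then recomputes the exact per-stage ratio f = F<sup>1/N</sup> and the input capacitance of each stage.

```python
import math

def size_inverter_chain(c_in, c_load, target_fanout=4.0):
    """Return (N, f, input capacitances) for a tapered inverter chain.

    F = c_load / c_in is the overall effective fan-out. N is chosen so that
    the per-stage ratio is close to target_fanout, then f = F**(1/N).
    """
    big_f = c_load / c_in
    n = max(1, round(math.log(big_f) / math.log(target_fanout)))
    f = big_f ** (1.0 / n)
    caps = [c_in * f ** i for i in range(n)]
    return n, f, caps

n, f, caps = size_inverter_chain(c_in=1.0, c_load=64.0)
print(f"N = {n}, f = {f:.2f}, stage input capacitances = {[round(c, 2) for c in caps]}")
```

Each stage then drives a load f times its own input capacitance, which is the equal-ratio condition stated for the chain.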



#### Sizing the Gates Based on Logical Effort

- The optimum fan-out for a chain of N inverters driving a load  $C_L$  is  $f = (C_L/C_{in})^{1/N}$ 
  - so, if we can, keep the fan-out per stage around 4 (f<sub>opt</sub>=4 for γ=1).
- Can the same approach (logical effort) be used for any combinational circuit?
  - For a complex gate, we expand the inverter equation

$$t_p = t_{p0} \left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0} (1 + f/\gamma)$$

to

$$t_p = t_{p0} (p + g f/\gamma)$$

- $t_{p0}$  is the intrinsic delay of an inverter
- f is the effective fan-out  $(C_{ext}/C_g)$, also called the electrical effort
- p is the ratio of the intrinsic (unloaded) delay of the complex gate to that of a simple inverter (a function of the gate topology and layout style). It reflects the fact that the intrinsic delay of a complex CMOS gate is larger than that of an inverter.
- g is the logical effort. It reflects the fact that a complex gate causes more delay than an inverter when it is used as fan-out (load).
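As a small worked illustration of the expanded delay expression, the sketch below compares an inverter with a 2-input NAND at the same effective fan-out. The values p = 2 and g = 4/3 for the NAND are the ones listed in the intrinsic-delay and logical-effort tables of this deck; γ = 1 and a normalized t<sub>p0</sub> = 1 are assumptions.

```python
def gate_delay(p, g, f, gamma=1.0, tp0=1.0):
    """Gate delay t_p = tp0 * (p + g * f / gamma)."""
    return tp0 * (p + g * f / gamma)

# Inverter (p = 1, g = 1) versus 2-input NAND (p = 2, g = 4/3), both at f = 4.
inv = gate_delay(p=1, g=1, f=4)
nand2 = gate_delay(p=2, g=4 / 3, f=4)
print(f"inverter: {inv:.2f} tp0, NAND2: {nand2:.2f} tp0")
```

Setting p = g = 1 recovers the inverter equation, which is a quick consistency check on the generalized form.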


#### Intrinsic Delay Term, p

The more involved the structure of the complex gate, the higher the intrinsic delay compared to an inverter

| Gate Type    | p                  |
|--------------|--------------------|
| Inverter     | 1                  |
| n-input NAND | n                  |
| n-input NOR  | n                  |
| n-way mux    | 2n                 |
| XOR, XNOR    | n·2<sup>n-1</sup>  |

Ignoring second order effects such as internal node capacitances


### Logical Effort Term, g

- g represents the fact that, for a given load, complex gates have to work harder than an inverter to produce a similar (speed) response
  - the logical effort of a gate tells how much worse it is at producing an output current than an inverter (how much more input capacitance a gate presents to deliver the same output current)

| Gate Type | g, 1 input | g, 2 inputs | g, 3 inputs | g, n inputs |
|-----------|------------|-------------|-------------|-------------|
| Inverter  | 1          |             |             |             |
| NAND      |            | 4/3         | 5/3         | (n+2)/3     |
| NOR       |            | 5/3         | 7/3         | (2n+1)/3    |
| mux       |            | 2           | 2           | 2           |
| XOR       |            | 4           | 12          |             |

### **Example – 8-input AND**



## Logical Effort

- Inverter has the smallest logical effort and intrinsic delay of all static CMOS gates
- The logical effort of a gate represents the ratio of its input capacitance to the input capacitance of an inverter when the gate is sized to deliver the same output current
- Logical effort increases with the gate complexity

### **Example of Logical Effort**

- Assuming a PMOS/NMOS ratio of 2, the input capacitance of a minimum-sized inverter is three times the gate capacitance of a minimum-sized NMOS (C<sub>unit</sub>)
- Logical effort is the ratio of input capacitance of a gate to the input capacitance of an inverter with the same output current.
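The definition above can be turned into a short calculation. The sketch assumes the PMOS/NMOS ratio of 2 stated here, counts input capacitance in units of C<sub>unit</sub>, and sizes each gate for the inverter's worst-case drive (series devices widened by the number of devices in series); the function names are hypothetical.

```python
from fractions import Fraction

def logical_effort_nand(n, ratio=2):
    """g of an n-input NAND: n series NMOS of width n, parallel PMOS of width `ratio`."""
    gate_input_cap = n + ratio      # one NMOS plus one PMOS per input
    inverter_input_cap = 1 + ratio
    return Fraction(gate_input_cap, inverter_input_cap)

def logical_effort_nor(n, ratio=2):
    """g of an n-input NOR: parallel NMOS of width 1, n series PMOS of width n * ratio."""
    gate_input_cap = 1 + n * ratio
    inverter_input_cap = 1 + ratio
    return Fraction(gate_input_cap, inverter_input_cap)

for n in (2, 3):
    print(f"{n}-input NAND: g = {logical_effort_nand(n)}, NOR: g = {logical_effort_nor(n)}")
```

With a ratio of 2 these expressions reduce to (n+2)/3 for the NAND and (2n+1)/3 for the NOR, matching the logical-effort table.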



### Delay in a Logic Gate




### **Delay as a Function of Fan-Out**



- The slope of the line is the logical effort of the gate
- The y-axis intercept is the intrinsic delay
- Can adjust the delay by adjusting the effective fan-out (by sizing) or by choosing a gate with a different logical effort

Gate effort: h = fg

### **Add Branching Effort**

#### Branching effort:




### Multistage Networks

$$Delay = \sum_{i=1}^{N} (p_i + g_i \cdot f_i)$$

- Stage effort (or gate effort):  $h_i = g_i f_i$
- Path electrical effort:  $F = C_{out}/C_{in}$
- Path logical effort:  $G = g_1 g_2 \dots g_N$
- Branching effort:  $B = b_1 b_2 \dots b_N$
- Path effort:  $H = GFB$

Path delay:  $D = \Sigma d_i = \Sigma p_i + \Sigma h_i$ 


### **Optimum Effort per Stage**

□ To minimize the total delay through the path, each stage should bear the same effort:

$$h^{N} = H, \qquad h = \sqrt[N]{H}$$

Stage efforts:  $g_{1}f_{1} = g_{2}f_{2} = \dots = g_{N}f_{N} = h$

Effective fanout of each stage:  $f_{i} = h/g_{i}$

Minimum path delay (Generally *s*=1):

$$D = t_{p0} \left( \sum_{j=1}^{N} p_{j} + \frac{N\sqrt[N]{H}}{\gamma} \right) = t_{p0} (P + NH^{1/N} / \gamma)$$


### Path Delay of Complex Logic Gate Network

□ Total path delay through a combinational logic block

$$t_p = \sum t_{p,j} = t_{p0} \sum (p_j + (f_j g_j)/\gamma)$$

So, for minimum delay through the path, each stage should bear the same gate effort:

 $f_1g_1 = f_2g_2 = \ldots = f_Ng_N = h = (H)^{1/N}$ 

□ Consider optimizing the delay through the logic network



how do we determine a, b, and c sizes?

#### Steps for Path Delay Optimization using Logical Efforts

- The path logical effort is  $G = \prod g_i$
- The path effective fan-out (path electrical effort) is  $F = C_L/C_{g1}$
- The branching effort accounts for fan-out to other gates in the network:  $b = (C_{on-path} + C_{off-path})/C_{on-path}$
- The path branching effort is then  $B = \prod b_i$
- The total path effort is then  $H = GFB$
- Optimum gate effort for each stage:

$$h = (H)^{1/N} = f_1 g_1 = f_2 g_2 = \dots = f_N g_N$$

- Effective fanout (electrical effort) of each stage:  $f_i = h/g_i$
- Optimum size of each gate:

$$s_i = \left(\frac{g_1 s_1}{g_i}\right) \prod_{j=1}^{i-1} \left(\frac{f_j}{b_j}\right)$$

- So, the minimum delay through the path is  $D = t_{p0} \left( \sum p_j + (N \cdot H^{1/N}) / \gamma \right)$ 




#### Example: Path Delay Optimization using Logical Efforts

□ For gate i in the chain, its size is determined by



- Following the steps discussed before:
  - $F = C_L / C_{g1} = 5$
  - $G = g_1 g_2 g_3 g_4 = 1 \times 5/3 \times 5/3 \times 1 = 25/9$
  - $B = b_1 b_2 b_3 b_4 = 1 \times 1 \times 1 \times 1 = 1$  (no branching)
  - $H = GFB = 125/9$, so the optimal stage effort is  $h=(H)^{1/N}=(H)^{1/4}= 1.93$ 
  - Fan-out factors are f<sub>1</sub>=h/g<sub>1</sub>=1.93/1=1.93, f<sub>2</sub>=h/g<sub>2</sub>=1.93/(5/3)=1.16, f<sub>3</sub>=h/g<sub>3</sub>=1.93/(5/3)=1.16, f<sub>4</sub>=h/g<sub>4</sub>=1.93/1=1.93
  - So the gate sizes are: s<sub>1</sub>=1, s<sub>2</sub>=a=(g<sub>1</sub>s<sub>1</sub>f<sub>1</sub>)/(g<sub>2</sub>b<sub>1</sub>)=1.16, s<sub>3</sub>=b=(g<sub>1</sub>s<sub>1</sub>/g<sub>3</sub>)·[(f<sub>1</sub>f<sub>2</sub>)/(b<sub>1</sub>b<sub>2</sub>)]=1.34, s<sub>4</sub>=c=(g<sub>1</sub>s<sub>1</sub>/g<sub>4</sub>)·[(f<sub>1</sub>f<sub>2</sub>f<sub>3</sub>)/(b<sub>1</sub>b<sub>2</sub>b<sub>3</sub>)]=2.60
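The worked example can be reproduced with a short script. This is a sketch of the listed steps, with a hypothetical function name and argument layout; `c_load` and `c_in_first` are expressed in the same (normalized) capacitance unit so that their ratio is the path electrical effort F.

```python
import math

def optimize_path(g, b, c_load, c_in_first, s1=1.0):
    """Apply the logical-effort steps to a chain of gates.

    g: logical efforts per stage; b: branching efforts per stage.
    Returns (stage effort h, per-stage fan-outs f, gate sizes s).
    """
    n = len(g)
    path_effort = math.prod(g) * (c_load / c_in_first) * math.prod(b)  # H = G*F*B
    h = path_effort ** (1.0 / n)
    f = [h / gi for gi in g]
    sizes = []
    running = 1.0  # product of f_j / b_j over the stages before stage i
    for i in range(n):
        sizes.append(g[0] * s1 / g[i] * running)
        running *= f[i] / b[i]
    return h, f, sizes

# Inverter, NOR-type gates with g = 5/3, inverter; F = 5, no branching.
h, f, sizes = optimize_path(g=[1, 5 / 3, 5 / 3, 1], b=[1, 1, 1, 1],
                            c_load=5.0, c_in_first=1.0)
print(f"h = {h:.2f}")
print("fan-outs:", [round(x, 2) for x in f])
print("sizes:", [round(x, 2) for x in sizes])
```

Evaluated without intermediate rounding, the last size comes out marginally below the 2.60 quoted in the example, which uses the rounded fan-out factors.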

Result:  $G = 25/9$,  $H = 125/9 = 13.9$,  $h = 1.93$,  $a = 1.16$,  $b = 1.34$,  $c = 2.60$.

### Summary

#### Table 4: Key Definitions of Logical Effort

| Term              | Stage expression             | Path expression                            |
|-------------------|------------------------------|--------------------------------------------|
| Logical effort    | $g$ (see Table 1)            | $G = \prod g_i$                            |
| Electrical effort | $h = \frac{C_{out}}{C_{in}}$ | $H = \frac{C_{out (path)}}{C_{in (path)}}$ |
| Branching effort  | n/a                          | $B = \prod b_i$                            |
| Effort            | f = gh                       | F = GBH                                    |
| Effort delay      | f                            | $D_F = \sum f_i$                           |
| Number of stages  | 1                            | N                                          |
| Parasitic delay   | $p$ (see Table 2)            | $P = \sum p_i$                             |
| Delay             | d = f + p                    | $D = D_F + P$                              |


Sutherland, Sproull, Harris

#### Fast Complex Gates: Design Technique 6

Reducing the voltage swing

$$t_{pHL} = 0.69 (3/4 (C_L V_{DD}) / I_{DSATn})$$

$$= 0.69 (3/4 (C_L V_{swing}) / I_{DSATn})$$

- linear reduction in delay
- also reduces power consumption
- requires use of "sense amplifiers" on the receiving end to restore the signal level (will look at their design when covering memory design)




### **Ratioed Logic**


- Purpose: to reduce number of transistors required to implement a given logic function, often at the cost of reduced robustness and extra power.
- Implementation: Replace PUN with a single unconditional load device.



Goal: to reduce the number of devices over complementary CMOS

### **Ratioed Logic with Resistive Load**

- Ratioed logic with resistive load:  $V_{OH}=V_{DD}$  (good), but  $V_{OL}=V_{DD}\cdot R_{PDN}/(R_{PDN}+R_L) \neq 0$.
- For V<sub>OL</sub>→0, we should set R<sub>L</sub>>>R<sub>PDN</sub>. That is, the output voltage swing and the functionality of the gate depend on the ratio between the PDN and R<sub>L</sub> → ratioed logic. (Static CMOS: ratioless)
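The dependence of V<sub>OL</sub> on the resistance ratio can be tabulated with a minimal sketch. The supply voltage of 2.5 V and the normalized R<sub>PDN</sub> = 1 are assumed values used only for illustration.

```python
def v_ol(vdd, r_pdn, r_load):
    """Low output level of a resistive-load gate (resistive divider)."""
    return vdd * r_pdn / (r_pdn + r_load)

VDD = 2.5  # assumed supply voltage in volts

for ratio in (1, 4, 9, 19, 99):
    print(f"R_L / R_PDN = {ratio:3d}: V_OL = {v_ol(VDD, 1.0, ratio):.3f} V")
```

The low level approaches 0 only as R<sub>L</sub> becomes much larger than R<sub>PDN</sub>, at the price of a slower pull-up through the larger load resistance.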



## **Ratioed Logic with Active Loads**

Depletion-load NMOS: use a depletion-mode NMOS transistor as the load.

✓ Depletion-mode NMOS: a conductive channel already exists when  $V_{GS}=0$  (normally "ON"). To turn it off, a negative  $V_{GS}$  must be applied (i.e.  $V_{Tn}<0$). It is commonly used as a resistor instead of a switch.

✓ Question: Can we use an NMOS load with its gate tied to Vdd? Why?



### **Pseudo-NMOS Inverter: VTC**



### **Pseudo-NMOS Inverter VTC**

□ Pseudo-NMOS inverter VTC:

 $\checkmark$ As  $(W/L)_p$  increases,  $V_{OL}$  increases; eventually this may cause the inverter to malfunction  $\rightarrow$  ratioed logic.



#### Pseudo-NMOS Gates: 4-input NOR/NAND

□ Question: Design 4-input pseudo-NMOS NOR gate and NAND gate?




Question: For circuit implementation in pseudo-NMOS, given the choice between NOR and NAND logic, which one would you prefer? Why? (Hint: Consider from performance perspective)



### Improved Loads (1): Adaptive Load

□ For  $V_{OL}\rightarrow 0$,  $M_2$  should be small (large  $R_{on}$), but this increases the  $t_{pLH}$  delay  $\rightarrow$  slow speed.

□ Solution: add a wide PMOS enable transistor M<sub>1</sub> (W(M<sub>1</sub>)>>W(M<sub>2</sub>)); this is used in the address decoders of memories.

✓ In standby mode, Enable=0, Enable'=1, and M<sub>1</sub> is off, so it does not affect the circuit.

✓ When an address change is detected, Enable=0→1, Enable'=1→0, and M<sub>1</sub> is turned ON to quickly charge the output to 1: reduced  $t_{pLH}$  delay.



### Improved Loads (2): DCVSL

Differential Cascode Voltage Switch Logic (DCVSL): eliminates static currents and provides rail-to-rail voltage swing.

- ✓ Each input is provided in complementary format, and the gate produces complementary outputs in turn.
- ✓ A feedback mechanism ensures that the load device is turned off when not needed.
- ✓ Pull-down networks PDN1 and PDN2 are mutually exclusive: when PDN1 is ON, PDN2 is OFF; when PDN1 is OFF, PDN2 is ON.

### **DCVS Logic: Working Principle**

DCVSL working principle: For example, if initially PDN1 is off, PDN2 is on, Out=1, Out'=0, M1: on, M2: off.



PDN1 and PDN2 are mutually exclusive


### **DCVS Logic: Working Principle**

□ If input pattern changes so that PDN1: off→on, PDN2: on→off. Now Out' is floating, M1 and PDN1 are both on. PDN1 must be strong enough to bring "Out" below  $V_{DD}$ - $|V_{Tp}|$ , so that M2 switches off→on and starts charging Out' to  $V_{DD}$  (0→1). This further turns M1 on→off, so that "Out" can be fully pulled-down to Gnd (1→0).




### **DCVSL Example**

□ DCVSL XOR-XNOR gate (It's possible to share transistors among two pull-down networks to reduce circuit area).

Question: Out=? Out'=?




### **DCVSL Transient Response**

- DCVSL AND/NAND gate: Out=? Out'=?
- Transient response: at t=0.2 ns, AB=00→11.
- Advantage of DCVSL: both Out and Out' are obtained simultaneously. (In static CMOS, Out is generated from Out' using an inverter, which adds an extra inverter delay between Out' and Out.)



Pass-transistor AND gate, F = AB (figure):

- AB=01: M1 on, M2 off, F=A=0.
- AB=10: M1 off, M2 on, F=0.
- AB=11: M1 on, M2 off, F=A=1.

 $\rightarrow$  Correct function of an AND gate.

### **Pass-Transistor Logic**




#### NMOS Transistors in Series/Parallel

- Primary inputs drive both gate and source/drain terminals
- The NMOS switch closes when the gate input is high



Remember - NMOS transistors pass a strong 0 but a weak 1


#### **PMOS Transistors in Series/Parallel**

- Primary inputs drive both gate and source/drain terminals
- The PMOS switch closes when the gate input is low



Remember - PMOS transistors pass a strong 1 but a weak 0


### **Pass Transistor (PT) Logic**




### Voltage Swing for Pass-Transistor Circuits

Example: NMOS Only PT Driving an Inverter



- V<sub>x</sub> does not pull up to V<sub>DD</sub>, but only to V<sub>DD</sub> - V<sub>Tn</sub>.
- The threshold voltage drop causes static power consumption: M<sub>1</sub> is turned on by V<sub>x</sub>, but V<sub>x</sub> may not be high enough to turn off M<sub>2</sub>. M<sub>2</sub> may conduct weakly, forming a path from V<sub>DD</sub> to GND via M<sub>2</sub> and M<sub>1</sub>.
- Note that V<sub>Tn</sub> of the pass transistor increases due to the body effect: the bulk of the NMOS is connected to GND, so V<sub>SB</sub>(M<sub>3</sub>) = V<sub>x</sub> ≠ 0.


### Voltage Swing of PT Driving an Inverter

- Due to the threshold drop of M<sub>3</sub>, V<sub>x</sub> rises to at most about 1.8 V instead of V<sub>DD</sub> (2.5 V).
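The reduced high level can be estimated numerically. The sketch below solves V<sub>x</sub> = V<sub>DD</sub> - V<sub>Tn</sub>(V<sub>x</sub>) by fixed-point iteration using the standard body-effect expression. The device parameters (`vt0`, `gamma`, `two_phi_f`) are assumed values of the kind quoted for a generic 0.25 µm process and are not taken from these slides; the function name is illustrative only.

```python
import math


def pass_transistor_high_level(vdd, vt0, gamma, two_phi_f, tol=1e-9, max_iter=100):
    """Solve V_x = V_DD - V_Tn(V_x) for an NMOS pass transistor with grounded bulk.

    V_Tn = V_T0 + gamma * (sqrt(2phi_F + V_SB) - sqrt(2phi_F)), with V_SB = V_x.
    """
    vx = vdd - vt0  # first guess: ignore the body effect
    for _ in range(max_iter):
        vtn = vt0 + gamma * (math.sqrt(two_phi_f + vx) - math.sqrt(two_phi_f))
        vx_next = vdd - vtn
        if abs(vx_next - vx) < tol:
            return vx_next, vtn
        vx = vx_next
    raise RuntimeError("fixed-point iteration did not converge")


# Assumed parameters (not from the slides): V_T0 = 0.43 V, gamma = 0.4 V^0.5, 2phi_F = 0.6 V
vx, vtn = pass_transistor_high_level(vdd=2.5, vt0=0.43, gamma=0.4, two_phi_f=0.6)
print(f"V_x = {vx:.2f} V, V_Tn = {vtn:.2f} V")
```

With these assumed parameters the result lands slightly below 1.8 V, which is in line with the value quoted above; different process parameters would shift it.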



### Cascaded NMOS Only PTs





- Improper cascading (left): the output of a pass gate drives the gate input of another pass transistor. Swing on  $y = V_{DD} - V_{Tn1} - V_{Tn2}$  (two threshold drops). Pass-transistor gates cannot be cascaded this way.
- Proper way of cascading pass gates (right): swing on  $y = V_{DD} - V_{Tn1}$  (only 1 threshold drop).
- The logic on the right still suffers from static power dissipation and reduced noise margins.

### VTC of PT AND Gate

- The VTC of a pass-transistor gate is data dependent:
  - $V_B = V_{DD}$,  $V_A = 0 \rightarrow V_{DD}$: $M_1$ is on and $M_2$ is off; F follows $V_A$ until $M_1$ turns off (at $V_A = V_{DD} - V_{Tn}$).
  - $V_A = V_{DD}$,  $V_B = 0 \rightarrow V_{DD}$: the inverter switches at $V_M = V_{DD}/2$. While $V_B < V_{DD}/2$, $M_2$ remains on and $V_F \approx 0$. Once $V_B > V_{DD}/2$, $M_2$ is off and $V_F$ follows $V_B$ minus a threshold drop ($V_B - V_{Tn}$).
- Pure PT logic is not regenerative: the signal gradually degrades after passing through a number of PTs (solution: occasional insertion of a static CMOS inverter).



### Differential PT Logic (CPL)





### **CPL** Properties

- Differential, so complementary data inputs and outputs are always available (no extra inverters needed)
- Still static, since the output-defining nodes are always tied to V<sub>DD</sub> or GND through a low-resistance path
- Design is modular: all gates use the same topology, only the inputs are permuted
- Simple XOR makes it attractive for structures like adders
- Fast (assuming the number of transistors in series is small)
- Additional routing overhead for complementary signals
- Still has static power dissipation problems

#### Differential TG Logic (DPL): Another Design



### **CPL Full Adder**




#### **Threshold Drop in PTL - Solution 1: Level Restorer**

- Threshold drop in PTL leads to reduced voltage swing, reduced noise margin and static power dissipation.
- Solution 1: use level restorer (a single feedback PMOS)
- Full swing on x (due to Level Restorer) so no static power consumption by inverter
- No static backward current path through Level Restorer and PT since Restorer is only active when A is high (If A=1=V<sub>DD</sub>, no current flows from V<sub>DD</sub> to A; If A=0, M<sub>r</sub> is off, again no current flowing from V<sub>DD</sub> to A).



#### **Threshold Drop in PTL - Solution 1: Level Restorer**

- For correct operation, M<sub>r</sub> must be sized correctly (ratioed).
  - When A=1→0 and B=1, M<sub>n</sub> tries to pull down X while the level restorer M<sub>r</sub> pulls X up to V<sub>DD</sub>. M<sub>n</sub> and M<sub>r</sub> must be sized so that X drops below the inverter threshold V<sub>M</sub>. Out then changes 0→1, M<sub>r</sub> turns off, and X is fully pulled down by A.
  - Transistor size of M<sub>r</sub>: (W/L)<sub>r</sub> should be small, so that the resistance R(M<sub>r</sub>) is large enough for X to be pulled down while M<sub>r</sub> and M<sub>n</sub> are simultaneously on. Otherwise, V<sub>x</sub> may never be able to switch the inverter, and the gate is locked in a single state.



### **Restorer Sizing: Simulation Result**

Simulation result: For (W/L)<sub>r</sub>>1.5µm/0.25µm, node X cannot be brought below V<sub>M</sub> of inverter, and can't switch the output.



- There is an upper limit on the restorer size: if M<sub>r</sub> is too large, node x never goes below V<sub>M</sub> of the inverter, so the output never switches.
- The pass-transistor pull-down can have several transistors in the stack.
- The restorer has speed and power impacts: it increases the capacitance at x, slowing down the gate, and it increases t<sub>r</sub> (but decreases t<sub>f</sub>) at "Out".

### Solution 2: Multiple V<sub>T</sub> Transistors

- Technology solution: use (near) zero-V<sub>T</sub> devices for the NMOS PTs to eliminate *most* of the threshold drop (the body effect is still in force, preventing a full swing to V<sub>DD</sub>). All devices other than the pass transistors still use standard high-V<sub>T</sub> devices.



### Solution 3: Transmission Gate (TG) Logic





### Transmission Gate XOR

- Transmission-gate based XOR circuit: 6 transistors (including the inverter used for B').
- Static CMOS XOR: 12 transistors (4 of them in the inverters that generate A' and B').
- Working principle:
  - When B=1, B'=0: M1 and M2 act as an inverter, the TG (M3/M4) is off, so F=A'B.
  - When B=0, B'=1: M1 and M2 are disabled, the TG (M3/M4) is on, so F=AB'.
  - Thus: F=A'B+AB'=A XOR B
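The two operating modes of the transmission-gate XOR can be checked against the XOR truth table with a small behavioral model. This is only a logic-level sketch (no voltages or delays), and the function name `tg_xor` is made up for illustration.

```python
def tg_xor(a: int, b: int) -> int:
    """Logic-level model of the transmission-gate XOR."""
    if b == 1:
        # B=1, B'=0: M1/M2 act as an inverter on A; the TG (M3/M4) is off
        return 1 - a
    # B=0, B'=1: M1/M2 are disabled; the TG (M3/M4) passes A unchanged
    return a


for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b} F={tg_xor(a, b)}")
```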


### Transmission Gate XOR

Another drawing of transmission gate based XOR circuit:



#### Transmission Gate Full Adder

 Transmission-gate based full adder circuit: 24 transistors (including the inverters used to generate the inverted inputs)



Similar delays for sum and carry.

### TG Full Adder

 Transmission-gate based full adder circuit: another drawing







### **TG Delay Optimization**

□ Can speed it up by inserting buffers every M switches



- Delay of a buffered chain (M TGs between buffers):

$$t_p = 0.69\left[\frac{N}{M}\,C R_{eq}\,\frac{M(M+1)}{2}\right] + \left(\frac{N}{M} - 1\right) t_{pbuf}$$

- The delay depends linearly on the number of TGs (N). Without buffer insertion the dependence is quadratic (N<sup>2</sup>), which is slow.
- Optimal number of TGs between buffers:

$$M_{opt} = 1.7\sqrt{\frac{t_{pbuf}}{C R_{eq}}} \approx 3 \text{ or } 4 \text{ (typically)}$$
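To get a feel for the numbers, the two expressions above can be evaluated directly. In the sketch below the component values (C = 10 fF, R<sub>eq</sub> = 5 kΩ, t<sub>pbuf</sub> = 200 ps) and the chain length N = 16 are hypothetical and chosen only for illustration; the function names are likewise made up. Setting M = N in the delay formula removes all buffers and gives the unbuffered delay.

```python
import math


def buffered_tg_delay(n, m, c, r_eq, t_buf):
    """Delay of N transmission gates with a buffer inserted every M gates."""
    return 0.69 * (n / m) * c * r_eq * m * (m + 1) / 2 + (n / m - 1) * t_buf


def optimal_segment_length(c, r_eq, t_buf):
    """M_opt = 1.7 * sqrt(t_pbuf / (C * R_eq)), obtained from d(t_p)/dM = 0."""
    return 1.7 * math.sqrt(t_buf / (c * r_eq))


# Hypothetical values: C = 10 fF, R_eq = 5 kOhm, t_pbuf = 200 ps, N = 16
c, r_eq, t_buf, n = 10e-15, 5e3, 200e-12, 16

m_opt = optimal_segment_length(c, r_eq, t_buf)
best_m = min(range(1, 9), key=lambda m: buffered_tg_delay(n, m, c, r_eq, t_buf))
t_unbuffered = buffered_tg_delay(n, n, c, r_eq, t_buf)  # M = N: no buffers
t_buffered = buffered_tg_delay(n, best_m, c, r_eq, t_buf)

print(f"M_opt (formula)   = {m_opt:.2f}")
print(f"best integer M    = {best_m}")
print(f"unbuffered delay  = {t_unbuffered * 1e12:.0f} ps")
print(f"buffered delay    = {t_buffered * 1e12:.0f} ps")
```

For these assumed values the best integer M found by direct search sits next to the closed-form M<sub>opt</sub>, and the buffered chain is clearly faster than the unbuffered one.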



## **Dynamic Logic**


### **Dynamic CMOS**

- In static circuits at every point in time (except when switching) the output is connected to either GND or V<sub>DD</sub> via a low resistance path.
  - fan-in of N requires 2N devices (N NMOS + N PMOS)
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes.
  - requires only N + 2 transistors (N+1 NMOS + 1 PMOS)
  - takes a sequence of precharge and conditional evaluation phases to realize logic functions
  - no static power consumption (this is better than pseudo-NMOS)


### Dynamic Gate: Φn Network

- Dynamic gate (Φn network): PDN + precharge transistor  $M_p$  + evaluation transistor  $M_e$ .
- Construction of Φn network: Starting from static CMOS circuit, replace PUN with a PMOS precharge transistor (M<sub>p</sub>), and insert a NMOS evaluation transistor (M<sub>e</sub>) between bottom of PDN and Gnd.



## **Dynamic Gate: Working Principle**

- Precharge phase: Clk=0, M<sub>p</sub> is on, M<sub>e</sub> is off, and the output is precharged to V<sub>DD</sub> regardless of the input values (Out=1).
- Evaluation phase: Clk=1, M<sub>p</sub> is off, M<sub>e</sub> is on, and the output is conditionally discharged depending on the input values to the PDN. For a given input pattern:
  - If PDN is off: Out=1, but "Out" is not connected to V<sub>DD</sub>. It is floating and relies on the charge previously stored on capacitor C<sub>L</sub> to hold "Out" at 1.
  - If PDN is on: Out=0; "Out" is discharged to 0 via the PDN.
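The two phases can be captured in a small logic-level model. The sketch below is an idealized abstraction (no leakage, charge sharing or delay), applied to a 2-input NOR pull-down network as an example; the function names and the `driven` flag, which marks whether Out has a low-resistance path to a rail, are illustrative assumptions.

```python
def phi_n_gate(clk, pdn_on, stored_out):
    """One step of an idealized Φn dynamic gate; returns (out, driven)."""
    if clk == 0:
        return 1, True          # precharge: M_p on, Out pulled to V_DD
    if pdn_on:
        return 0, True          # evaluation: Out discharged via PDN and M_e
    return stored_out, False    # evaluation, PDN off: Out floats, held by C_L


def nor2_pdn(a, b):
    """PDN of a NOR2 gate: two NMOS transistors in parallel."""
    return bool(a or b)


out = 1
trace = []
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    for clk in (0, 1):  # each cycle: precharge, then evaluation
        out, driven = phi_n_gate(clk, nor2_pdn(a, b), out)
        trace.append((a, b, clk, out, driven))
        print(f"A={a} B={b} Clk={clk} Out={out} driven={driven}")
```

In every evaluation phase the output equals (A+B)', and only the AB=00 case leaves Out floating at 1.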



### Dynamic Logic: Φn/Φp Network

- Φn network: PDN + precharge M<sub>p</sub> (PMOS) + evaluation M<sub>e</sub> (NMOS)
  - Φ=0, precharge phase: M<sub>p</sub> on, M<sub>e</sub> off, Out is precharged to 1: Out=1
  - Φ=1, evaluation phase: M<sub>p</sub> off, M<sub>e</sub> on, Out depends on PDN
- Φp network: PUN + predischarge M<sub>p</sub> (NMOS) + evaluation M<sub>e</sub> (PMOS)
  - Φ=1, predischarge phase: M<sub>p</sub> on, M<sub>e</sub> off, Out is predischarged to 0: Out=0
  - Φ=0, evaluation phase: M<sub>p</sub> off, M<sub>e</sub> on, Out depends on PUN
    - If PUN is off, Out remains 0 but is floating.
    - If PUN is on, Out is connected to V<sub>dd</sub> via PUN: Out=1.
  - The Φp network is less popular because PMOS is slower than NMOS.



### **Dynamic Gate: Working Principle**

- Question: Implement a 2-input NOR gate, Out=(A+B)', with static CMOS, Φn CMOS and Φp CMOS separately. Rules:
- ✓ Φn network: Starting from static CMOS circuit, replace PUN with a PMOS precharge transistor (M<sub>p</sub>), and insert a NMOS evaluation transistor (M<sub>e</sub>) between bottom of PDN and Gnd.
- ✓ Φp network: Starting from static CMOS circuit, replace PDN with a NMOS predischarge transistor (M<sub>p</sub>), and insert a PMOS evaluation transistor (M<sub>e</sub>) between top of PUN and Vdd.




## **Dynamic Gate: Working Principle**

 Question: Compare static CMOS and Φp NOR2 gates, if AB=00→01→10→11, sketch waveforms of Out? (Within each clock cycle: precharge→evaluation)



### **Φn Network: Conditions on Output**

- Dynamic logic works in two phases: precharge and evaluation.
- In the precharge phase, the output is always 1 regardless of the input pattern.
- In the evaluation phase, the output is the valid function of the inputs. If output=1, the output is floating (high-impedance state) and relies on the charge previously stored on the capacitor to hold it at "1". (In contrast, static CMOS always has a low-resistance path between the output and one of the power rails.)
- Once the output of a dynamic Φn gate is discharged, it cannot be charged again until the next precharge operation.
- The output can make at most one transition  $(1 \rightarrow 0)$  during evaluation.
- The output is valid only in the evaluation phase. In the precharge phase, the output is always "1" and cannot be used.
- A digital system can be designed so that the precharge time coincides with other system functions. E.g. the precharge of the arithmetic unit in a microprocessor could coincide with instruction decoding.


### **Properties of Dynamic Gates**

- Logic function is implemented by the PDN only
  - number of transistors is N + 2 (versus 2N for static complementary CMOS)
  - should be smaller in area than static complementary CMOS
- Full swing outputs ( $V_{OL}$  = GND and  $V_{OH}$  =  $V_{DD}$)
- Nonratioed: sizing of the devices is not important for proper functioning (sizing only affects performance)
- □ Faster switching speeds
  - reduced load capacitance due to lower number of transistors per gate (C<sub>int</sub>) so a reduced logical effort
  - reduced load capacitance due to smaller fan-out (C<sub>ext</sub>)
  - Ignoring the influence of precharge time on the switching speed of the gate, t<sub>pLH</sub> = 0 (why?) but the presence of the evaluation transistor slows down the t<sub>pHL</sub>
- no I<sub>sc</sub>, so all the current provided by PDN goes into discharging C<sub>L</sub>

### **Dynamic Behavior**

Dynamic behavior of dynamic 4-input NAND:

 $\checkmark$ The PDN starts to work as soon as the input signals exceed V<sub>Tn</sub>, so V<sub>M</sub>, V<sub>IH</sub> and V<sub>IL</sub> are all set equal to V<sub>Tn</sub> → low noise margin (NM<sub>L</sub>)



### Power Consumption of Dynamic Gate

- Dynamic logic seems to consume less power because:
  - Physical capacitance is lower (it uses fewer transistors).
  - Load capacitance is smaller (the load per fan-out is 1 transistor instead of 2).
  - No glitching power: the output makes at most 1 transition per cycle.
  - No short-circuit power: $M_p$ is off when the PDN is evaluating.




### Power Consumption of Dynamic Gate

- However, dynamic logic generally consumes more power than static CMOS. Why?
  - Its clock power can be significant: the clock has a guaranteed transition on every clock cycle.
  - Short-circuit power may exist when leakage-combating devices are added.
  - It has higher switching activity ($\alpha_{0 \rightarrow 1}$) due to the periodic precharge and discharge operation (switching power:  $P_{sw} = \alpha_{0 \rightarrow 1} C_L V_{dd}^2 f$).



**Dynamic Power Consumption: Data Dependent** 

- 0→1 switching probability of a static CMOS gate:  $\alpha_{0 \rightarrow 1}(\text{static}) = p_0 p_1 = p_0(1-p_0)$
- Dynamic Φn gate: the output makes a 0→1 transition during precharge only if the output was 0 in the preceding evaluation phase. Thus  $\alpha_{0 \rightarrow 1}$  depends only on the signal probability  $p_0$:

$$\alpha_{0 \rightarrow 1}(\text{dynamic}) = p_0 \ge p_0(1-p_0) = \alpha_{0 \rightarrow 1}(\text{static})$$

Example: static CMOS and Φn dynamic 2-input NOR gates. Assume signal probabilities  $P_{A=1} = P_{B=1} = 1/2$.

| A | В | Out |
|---|---|-----|
| 0 | 0 | 1   |
| 0 | 1 | 0   |
| 1 | 0 | 0   |
| 1 | 1 | 0   |



Transition probability of  $\Phi$ n NOR2:

 $\alpha_{0 \rightarrow 1}$  (dynamic) =  $P_{out=0}$  = 3/4

Static CMOS NOR2 gate:

$$\alpha_{0\to 1}(\text{static}) = p_0 p_1 = (3/4) \times (1/4) = 3/16$$

Switching activity can be higher in dynamic gates!

Dynamic gates:  $\alpha_{0\rightarrow 1} = P_{out=0}$ 
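The NOR2 numbers above can be reproduced by enumerating the truth table. The sketch below assumes independent inputs that are 1 with probability 1/2 each (the assumption under which the example yields 3/4); the helper name `switching_activity` is illustrative.

```python
from fractions import Fraction
from itertools import product


def nor2(a, b):
    return int(not (a or b))


def switching_activity(gate, n_inputs=2, p_one=Fraction(1, 2)):
    """Return (alpha_static, alpha_dynamic_phi_n) for independent inputs."""
    p_out0 = Fraction(0)
    for bits in product((0, 1), repeat=n_inputs):
        p = Fraction(1)
        for bit in bits:
            p *= p_one if bit else 1 - p_one
        if gate(*bits) == 0:
            p_out0 += p
    alpha_static = p_out0 * (1 - p_out0)   # p0 * p1
    alpha_dynamic = p_out0                 # Φn gate: recharge whenever Out was 0
    return alpha_static, alpha_dynamic


alpha_static, alpha_dynamic = switching_activity(nor2)
print("static :", alpha_static)
print("dynamic:", alpha_dynamic)
```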

- Question: Find  $\alpha_{0 \rightarrow 1}$  for a dynamic  Φp  NOR2 gate. ( $\alpha_{0 \rightarrow 1} = P_{out=1}$)

### Issues in Dynamic Design 1: Charge Leakage

A dynamic gate relies on dynamic storage of the output value on a capacitor. In evaluation, if the PDN is off, Out=1. The gate relies on the charge stored on capacitor C<sub>L</sub> during precharge to maintain the output at  $V_{DD}$. However, due to leakage currents, the charge gradually leaks away and V<sub>out</sub> drops, eventually resulting in malfunctioning of the gate.  $\rightarrow$ The clock period of dynamic logic should not be too long.



The dominant component is subthreshold current. Dynamic logic requires a minimum clock rate of a few kHz.

### Impact of Charge Leakage

- The output settles to an intermediate voltage determined by a resistive divider of the pull-up and pull-down networks.
- Once the output drops below the switching threshold of the fan-out logic gate, the output is interpreted as a low voltage. (Question: Does static CMOS have this problem?)



#### **Dynamic Logic: Charge Leakage Solution**

- Solution to charge leakage: add a bleeder transistor between V<sub>dd</sub> and the output.
  - Using a grounded PMOS as bleeder: ratioed, not good.
  - Bleeder in feedback configuration: no static power dissipation.



#### **Issues in Dynamic Design 2: Charge Sharing**

- Assume AB=00 in precharge and that  $C_a$  is discharged. In evaluation, B stays 0, but A makes a 0→1 transition and turns on M<sub>a</sub>. Out should remain 1, but the charge stored originally on C<sub>L</sub> is redistributed over C<sub>L</sub> and C<sub>a</sub>. This causes a drop in V<sub>out</sub>, which cannot be recovered due to the dynamic nature of the circuit. → charge sharing!



The charge stored originally on  $C_{L}$  is redistributed (shared) over  $C_{L}$  and  $C_{a}$, leading to static power consumption by downstream gates and possible circuit malfunction.

The resulting change in output voltage is  $\Delta V_{out} = -V_{DD} (C_a / (C_a + C_L))$. When this drop is large enough to bring  $V_{out}$  below the switching threshold of the gate it drives, it may cause a malfunction.
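As a quick numeric illustration of the expression for ΔV<sub>out</sub>, the sketch below plugs in hypothetical values (V<sub>DD</sub> = 2.5 V, C<sub>L</sub> = 50 fF, C<sub>a</sub> = 15 fF) and compares the result with an assumed switching threshold V<sub>M</sub> = V<sub>DD</sub>/2 for the driven gate. It uses the formula exactly as written, i.e. it assumes the charge equalizes fully between the two capacitors; none of the numbers come from the slides.

```python
def charge_sharing_drop(vdd, c_a, c_l):
    """Delta V_out = -V_DD * C_a / (C_a + C_L), assuming full charge equalization."""
    return -vdd * c_a / (c_a + c_l)


# Hypothetical values, for illustration only
vdd, c_l, c_a = 2.5, 50e-15, 15e-15
v_m = vdd / 2  # assumed switching threshold of the driven gate

delta_v = charge_sharing_drop(vdd, c_a, c_l)
v_out = vdd + delta_v
print(f"Delta V_out = {delta_v:.3f} V, V_out = {v_out:.3f} V")
print("risk of malfunction" if v_out < v_m else "output still read as 1")
```

The drop grows as C<sub>a</sub> becomes comparable to C<sub>L</sub>, so gates with large internal capacitance relative to the output load are more exposed.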


### **Charge Sharing in Dynamic Gate**

 $\Box$  Due to charge sharing, what's the final value of V<sub>out</sub>?



### **Charge Sharing Example**

□ 3-input EXOR gate:  $y=A \oplus B \oplus C$ . What is the worst case voltage drop on y? (Assume all inputs are low during precharge and that all internal nodes are initially at 0V.) Since  $C_c > C_d$ ,  $C_a = C_b$ , worst cases: charge on  $C_y$ is shared with  $C_a + C_c$  or  $C_b + C_c$ . This happens when C' is off, but A,B' or A',B are on, i.e. for patterns ABC=011 or 101.



### Solution to Charge Redistribution

□ Solution to charge redistribution: Precharge critical internal nodes using a clock-driven transistor (at the cost of increased area and power)

□ Since internal nodes are charged to V<sub>DD</sub> during precharge, charge sharing does not occur.



#### Issues in Dynamic Design 3: Backgate Coupling

- Output of dynamic logic is susceptible to crosstalk due to 1) high impedance of the output node, 2) capacitive coupling.
- Backgate (output-to-input) coupling: In following circuit, Out2=(In·Out1)'. Out2 capacitively couples with Out1 through the gate-source and gate-drain capacitances of M4.
- ✓ If Out1=1 and In=0→1, then Out2=1→0. This output transition couples capacitively to Out1, so Out1 drops significantly and causes M6 to turn on.
  - $\rightarrow$ leakage current via M6-M4-M3; Out2 is not fully pulled down to 0V.



### **Backgate Coupling Effect**

Simulation result: Capacitive coupling means Out1 drops significantly so Out2 doesn't go all the way to ground



#### Issues in Dynamic Design 4: Clock Feedthrough

- Clock feedthrough: a special case of capacitive coupling between the clock input of the precharge transistor and the dynamic output node.
- When CLK=0→1, rising transition of CLK is capacitively coupled to Out and causes the output (dynamic node) to rise above VDD. Similarly, when CLK=1→0, falling transition of CLK is coupled to Out and causes it to temporarily fall below 0V.



Coupling between Out and CLK input of the precharge device due to the gate-drain capacitance. So voltage of Out can rise above  $V_{DD}$ . The fast rising (and falling edges) of the clock couple to Out.

## **Clock Feedthrough**

□ Clock feedthrough may cause a normally reverse-biased junction diode to become forward-biased. → Electrons are injected into the substrate and can be collected by a nearby high-impedance node in the "1" state. → This may eventually cause a malfunction!



## **Cascading Dynamic Gates**

- Straightforward cascading of dynamic gates does not work!
- Ex: cascading 2 dynamic Φn inverters. When CLK=0→1 (evaluation), In=1, Out2=(In')' should remain 1. But Out1=1→0 with propagation delay. As long as Out1>V<sub>Tn</sub>, it turns on NMOS in next inverter, Out2 is mis-discharged till V<sub>out1</sub><V<sub>Tn</sub>. Once Out2 is mis-discharged, it cannot recover back to V<sub>DD</sub> till next precharge phase.



#### Cascading Dynamic Gates #1: Domino Logic

- Cascading problem is because outputs of each gate (thus the inputs to next stages) are precharged to 1. This may turn on NMOS transistors in PDN of next stage and cause inadvertent discharge in evaluation.
- Solution 1: Domino logic. Add an inverter to set the inputs to the next stage to 0 instead of 1 during precharge. → The inverter outputs turn off the transistors in the PDN of the next stage after precharge. Transistors are turned on only when needed, and at most once per cycle. → No inadvertent discharging in evaluation.



### **Domino Logic**

- Domino logic: N-type dynamic logic block followed by inverter.
- In precharge, output of PDN is precharged to "1", Out1="0". NMOS transistor in next stage connected to Out1 is off.
- In evaluation, NMOS transistors in the next Φn stage either remain off or turn on as needed: no 1→0 transition, no inadvertent discharging!
- Inverter can also be used to drive a bleeder to combat leakage and charge redistribution.



## Why Called Domino?



Why called Domino? For chain of domino gates: In precharge, output of all stages are set to 0 simultaneously (parallel process). In evaluation, output of 1<sup>st</sup> domino block either stays at 0 or makes 0→1 transition, affecting the 2<sup>nd</sup> gate. This effect ripples through the whole chain one after the other, like a line of falling dominoes (serial process). – hence the name.
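The rippling behavior can be illustrated with a toy model of a chain of domino buffers (each stage being a dynamic Φn inverter followed by a static inverter). The sketch treats one gate delay as one discrete step and ignores all electrical effects; the function name `domino_chain` is made up for illustration.

```python
def domino_chain(first_input, n_stages):
    """Stage outputs after each evaluation step of a chain of domino buffers."""
    outs = [0] * n_stages            # precharge: all domino outputs are 0 (in parallel)
    history = [outs.copy()]
    for _ in range(n_stages):        # evaluation: one gate delay per step
        prev = outs.copy()
        for i in range(n_stages):
            stage_in = first_input if i == 0 else prev[i - 1]
            # a domino output can only rise (0 -> 1) during evaluation
            outs[i] = 1 if (prev[i] or stage_in) else 0
        history.append(outs.copy())
    return history


for step, outs in enumerate(domino_chain(first_input=1, n_stages=3)):
    print(f"step {step}: {outs}")
```

With the first input at 1, the outputs rise one stage per step; with the first input at 0, every stage stays at its precharged value.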



### **Properties of Domino Logic**

- Only non-inverting logic can be implemented, fixes include
  - can reorganize the logic using Boolean transformations
  - use differential logic (dual rail)
  - use np-CMOS (zipper)
- Very high speed
  - t<sub>pHL</sub> = 0 (only t<sub>pLH</sub> exists)
  - static inverter can be optimized to match fan-out (separation of fan-in and fan-out capacitances)
  - reduced input capacitance, so a smaller logical effort


### **Designing with Domino Logic**



### **Footless Domino**

- Footless Domino: Domino logic with foot transistor (evaluation transistor M<sub>e</sub>) removed.
- However, removing M<sub>e</sub> extends the precharge cycle: precharge now has to ripple through the logic network as well.
- Ex: footless Domino inverter chain. In evaluation, In<sub>1</sub>=1, then Out<sub>i</sub>=0, In<sub>i</sub>=1. When clk=1→0 with In<sub>1</sub>=1, In<sub>2</sub> makes its 1→0 transition only after 2 gate delays. Before that, Out<sub>2</sub> cannot be precharged because the transistor driven by In<sub>2</sub> is still on. Similarly, the 3<sup>rd</sup> gate has to wait until the 2<sup>nd</sup> gate precharges before it can start precharging.



 Precharge is rippling (no longer a parallel process), which causes short-circuit current. A solution is to delay the clock for each stage.

### **Domino Manchester Carry Chain**




### **Domino Comparator**



### **Domino Zero Detector**



#### Dealing with Noninverting Property of Domino: #1. De Morgan's law

- Domino logic: only non-inverting logic can be implemented. This has limited usage of pure domino logic.
- Solution 1: Reorganize the logic using simple Boolean transforms (e.g. De Morgan's law). However, this is not always possible.




#### Dealing with non-inverting property of Domino: #2. Differential (Dual Rail) Domino

- Solution 2: Differential (dual-rail) Domino. Similar to DCVSL, can be used to implement any arbitrary function.
- ✓ More power due to a guaranteed transition every clock cycle regardless of the input pattern: either Out or Out' must make a 0→1 transition.



### Variations: Multiple-output Domino

- Multiple-output domino logic: to reduce transistor count. It exploits the fact that certain outputs are subsets of other outputs to generate a number of logic functions in a single gate.
- E.g. In following circuit, O3=C+D is used in all 3 outputs, so it is implemented at the bottom of PDN. O2 and O1 reuse O3 without re-implementing it.



### Variations: Compound Domino

- Compound domino: used to reduce transistor count. It combines the outputs of multiple dynamic gates with a complex static CMOS gate.
- Ex: In following circuit, O1=(ABC)', O2=(DEF)', O3=(GH)', Final output: O=[(O1+O2)O3]'=ABCDEF+GH → reduced fan-in, faster speed.
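The Boolean identity in this example follows from De Morgan's law and can also be confirmed by exhaustive enumeration. The sketch below models only the logic (dynamic gate outputs O1, O2, O3 feeding the static gate); the function name is illustrative.

```python
from itertools import product


def compound_domino(a, b, c, d, e, f, g, h):
    """O = [(O1 + O2) * O3]' with O1=(ABC)', O2=(DEF)', O3=(GH)'."""
    o1 = 1 - (a & b & c)
    o2 = 1 - (d & e & f)
    o3 = 1 - (g & h)
    return 1 - ((o1 | o2) & o3)


def expected(a, b, c, d, e, f, g, h):
    """Target function ABCDEF + GH."""
    return (a & b & c & d & e & f) | (g & h)


mismatches = [bits for bits in product((0, 1), repeat=8)
              if compound_domino(*bits) != expected(*bits)]
print("mismatching input patterns:", len(mismatches))
```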



#### Cascading Dynamic Gates #2: np-CMOS (Zipper)

Cascading dynamic gates, solution 2: np-CMOS exploits the duality between Φn and Φp networks to eliminate the cascading problem. If Φn gates are controlled by CLK and Φp gates are controlled by CLK', then Φn gates can directly drive Φp gates, and vice versa.



### Cascading Dynamic Gates: np-CMOS (Zipper)

- np-CMOS: Cascade Φn and Φp networks alternately without extra inverters
- Precharge: CLK=0, Out1 from Φn is precharged to Vdd, it turns off PMOS in next Φp network; Out2 from Φp is predischarged to Gnd, it turns off NMOS in next Φn network.
- ✓ Evaluation: CLK=1, PMOS in next Φp network either remains off or turned on as needed → no inadvertent charging. Same for NMOS in next Φn network.



## np-CMOS Adder Circuit



## np-CMOS (Zipper)

- □ Limitation of np-CMOS:
- P-tree blocks are slower than n-tree modules (PMOS is slower than NMOS given the same size). Equalizing propagation delay requires extra area.
- The lack of buffers requires that dynamic nodes be routed between gates, which is undesirable.



### How to Choose a Logic Style

Must consider ease of design, robustness (noise immunity), area, speed, power, system clocking requirements, fan-out, functionality, ease of testing.

Comparison for a 4-input NAND:

| Style       | # Trans | Difficulty | Ratioed? | Delay | Power   |
|-------------|---------|------------|----------|-------|---------|
| Static CMOS | 8       | 1          | no       | 3     | 1       |
| CPL*        | 12 + 2  | 2          | no       | 4     | 3       |
| domino      | 6 + 2   | 4          | no       | 2     | 2 + clk |
| DCVSL*      | 10      | 3          | yes      | 1     | 4       |

\* Dual Rail

Current trend is towards an increased use of complementary static CMOS: design support through DA tools, robust, more amenable to voltage scaling.